Abstract

We present a divide-and-conquer algorithm for parsing context-free languages efficiently. Our
algorithm is an instance of Valiant's (1975), who reduced the problem of parsing to matrix
multiplications. We show that, while the conquer step of Valiant's is $O(n^3)$, it improves to
$O(\log^2 n)$ under certain conditions satisfied by many useful inputs, and if one uses a sparse
representation of matrices.
The required conditions occur for example in program texts written by humans. The improvement 
happens because the multiplications involve an overwhelming majority of empty matrices. This result 
is relevant to modern computing: divide-and-conquer algorithms with a polylogarithmic conquer step 
can be parallelised relatively easily. 



1 Introduction 

Recent years have seen the rise of parallel computer architectures for the masses. Multicore 
CPUs and GPUs are legion. One would expect functional programs to be a perfect match 
for these architectures. Indeed, thanks to the absence of side-effects, functional programs 
are conceptually easy to parallelise. However, functional programmers have traditionally 
relied heavily on lists as the data-structure of choice. This tradition hinders the
adaptation of functional programs to the age of parallelism. Indeed, the very linear structure of
lists imposes a sequential treatment of them. In an eloquent 2009 ICFP invited talk, Guy
Steele harangued the functional programming crowds to stop using lists and use sequences,
represented as balanced trees. If a computation over them follows the divide-and-conquer
skeleton, and uses an associative operator to cheaply combine intermediate results at each
node, their fractal structure allows one to take advantage of many processors in parallel; in fact
as many as there are leaves in the tree.

An additional benefit of the structure is its ability to support incremental computation. That
is, if one remembers the intermediate results of the computation for each node, then after
changing a single leaf in the tree, it suffices to recompute the results for the nodes which
are on the path from the root to the given leaf. If the tree is balanced, this means that one
has to run the associative operator only a few times to update the result after a single
incremental change.

Some problems are naturally solved by divide-and-conquer algorithms. This is the case 
for example of vector operations, which treat each element independently of the others. 
However, many problems require creativity to discover efficient divide-and-conquer
solutions. This is the case of the problem of parsing context-free languages.

Valiant [1975] discovered a divide-and-conquer algorithm for context-free recognition. 
However, given Valiant's assumptions, the cost of the conquer step is cubic. This means 
that the conquer step dominates the cost of the algorithm: what we gain by running sub- 
problems in parallel is dwarfed by the cost of what we must run sequentially. Therefore 
the divide-and-conquer structure does not yield a significant performance benefit. In this 
paper, we show that on most inputs, one can carefully implement Valiant's algorithm to get 
a polylogarithmic conquer step, yielding good overall performance. 

Outline The rest of the paper is organized as follows. In Sec. 2 we review the
divide-and-conquer skeleton, how it adequately abstracts incremental and parallel computation, and its
relationship with sequence homomorphisms. In Sec. 3 we review chart-based context-free
parsing, and derive Valiant's algorithm from its specification. In Sec. 4 we characterize a
sub-class of context-free languages. We argue that this class corresponds to hierarchically
organized inputs. We proceed to show that for such languages, the average complexity of
the conquer step of the parsing algorithm is $O(\log^2 n)$. In Sec. 5, we describe an extension
of context-free grammars. This extension remains parseable with Valiant's algorithm.
Using this extension, we show how to reduce parse iteration (Kleene's closure) hierarchically.
We conclude with a discussion of our results.



2 The Divide-And-Conquer Skeleton

Our aim is to construct a parallel and incremental parsing algorithm. To do so, we need a
sufficiently abstract model of incremental and parallel computation, and choose the
divide-and-conquer skeleton. We further assume that the input is provided as a sequence of input
symbols (taken from a finite alphabet $\Sigma$), that is, a string. Our definition of this skeleton
relies on the theory of sequences as initial algebras developed by Bird [1986].

Definition 1

A sequence-algebra is a triple of:

• A carrier type $a$

• A constant nil of type $a$

• A ternary operation bin of type $a \to \Sigma \to a \to a$

which satisfies the associative law:

\[ bin\ a\ x\ (bin\ b\ y\ c) = bin\ (bin\ a\ x\ b)\ y\ c \qquad (1) \]

The type of sequences of $\Sigma$, written Seq, can be defined as the initial sequence-algebra.
Concretely, one naive way to implement Seq is as a list. In actual implementations,
sequences will be represented by more complex data structures; perhaps trees featuring
dynamic re-balancing such as finger trees [Hinze and Paterson, 2006]. The associative
law (1) guarantees that re-balancing is not observable by user code. We will write Nil and
Bin (with capitals) for the operations of the initial sequence-algebra:

\[ Nil : Seq \]
\[ Bin : Seq \to \Sigma \to Seq \to Seq \]
Assume a function $f : Seq \to A$. The construction of a divide-and-conquer algorithm
computing $f$ can be specified as finding a sequence-algebra $\mathcal{A} = (A, nil_A, bin_A)$
such that $f$ is a homomorphism between Seq and $\mathcal{A}$.

That is, we need a carrier type $A$, a constant $nil_A$ and a function $bin_A$ such that (1) is
satisfied and

\[ nil_A = f\ Nil \]
\[ bin_A\ (f\ l)\ x\ (f\ r) = f\ (Bin\ l\ x\ r) \]

Given such an algebra $\mathcal{A}$ and a sequence $t$, one can compute $f\ t$ as the catamorphism of
$\mathcal{A}$ applied to $t$.
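The definitions above can be sketched in Haskell as follows. This is only an illustration under our own assumptions: the names `SeqAlg`, `nilA`, `binA`, `cata`, `lengthAlg` and `toListAlg` are hypothetical and do not come from the paper's implementation, and sequences are represented naively as unbalanced binary trees.

```haskell
-- Sequences over an alphabet s, represented as binary trees
-- with one symbol stored at each node.
data Seq s = Nil | Bin (Seq s) s (Seq s)

-- A sequence-algebra with carrier a: a constant and a ternary operation.
-- (Hypothetical record; the associative law (1) is not enforced by the types.)
data SeqAlg s a = SeqAlg
  { nilA :: a
  , binA :: a -> s -> a -> a
  }

-- The catamorphism: replace every Nil by nilA and every Bin by binA.
cata :: SeqAlg s a -> Seq s -> a
cata alg Nil         = nilA alg
cata alg (Bin l x r) = binA alg (cata alg l) x (cata alg r)

-- Example algebra: the length of the sequence.
lengthAlg :: SeqAlg s Int
lengthAlg = SeqAlg { nilA = 0, binA = \l _ r -> l + 1 + r }

-- Example algebra: flattening the sequence to a list, left to right.
toListAlg :: SeqAlg s [s]
toListAlg = SeqAlg { nilA = [], binA = \l x r -> l ++ x : r }
```

Both example algebras satisfy the associative law, so two trees of different shapes that hold the same symbols in the same order yield the same result; this is the sense in which re-balancing is not observable.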

Assuming an implementation of Seq as trees, one can obtain a parallel algorithm by 
spawning a new thread of execution at each node. In an actual implementation, the shape 
of the tree structure will be dictated by the architecture of the computer running the code. 
The implementation is free to choose the structure: any choice yields the same result, as 
guaranteed by the associative law (1). 

An incremental algorithm can be obtained by caching the intermediate results in each 
node. An update at a leaf of the tree needs to run the bin function d times, where d is the 
depth of the leaf in the tree. 

In all the cases considered in the remainder, we never bother to prove the associative law
for the bin function that we construct. Indeed, because we consider only values which
are generated by the sequence-homomorphism $f$, associativity holds automatically. In other
words, the fact that bin adequately implements $f$ implies associativity.

Lemma 1

Given $f : Seq \to A$, and $bin : A \to \Sigma \to A \to A$ such that

\[ bin\ (f\ l)\ x\ (f\ r) = f\ (Bin\ l\ x\ r) \]

then $(A', f\ Nil, bin)$ is a sequence-algebra, where $A'$ is the image of Seq under $f$.

Proof

The missing associative law is obtained as follows:

    bin a x (bin b y c)
  = {- by A' being the image of f: a = f s, b = f t, c = f u for some s, t, u -}
    bin (f s) x (bin (f t) y (f u))
  = {- by assumption on bin -}
    f (Bin s x (Bin t y u))
  = {- by Seq being a sequence-algebra -}
    f (Bin (Bin s x t) y u)
  = {- by assumption on bin -}
    bin (bin (f s) x (f t)) y (f u)
  = {- by definition of a, b, c -}
    bin (bin a x b) y c

□
Performance Crucially, in order for parallelisation or incrementalization to yield benefits
in terms of performance, the cost of running bin must be at most quasilinear. Let us analyze
why by using the following standard result:

Theorem 1 (Master Theorem, Cormen et al.)
Assume a function $T_n$ constrained by the recurrence

\[ T_n = a\,T_{n/b} + f(n) \]

(Such an equation will typically come from a divide-and-conquer algorithm, where $a$ is
the number of sub-problems at each recursive step, $n/b$ is the size of each sub-problem,
and $f(n)$ is the running time of dividing up the problem space into $a$ parts, and combining
the sub-results together.)

If we let $e = \log_b a$ and $f(n) = O(n^c \log^d n)$, then

\[
\begin{aligned}
T_n &= O(n^e) && \text{if } c < e \\
T_n &= O(n^c \log^{d+1} n) && \text{if } c = e \\
T_n &= O(n^c \log^d n) && \text{if } c > e
\end{aligned}
\]

In our description of sequence homomorphisms we have assumed $b = 2$. In the case of a
sequential algorithm, $a = 2$, but in presence of parallelism or incrementality, $a = 1$, because
both sub-problems can be run in parallel or because the result of one sub-problem is already
computed. In sum $e = 1$ corresponds to the sequential case, while $e = 0$ corresponds to a
parallel or incremental case. We can then compute the asymptotic behavior of $T_n$ for each
case:

               e = 1 (sequential)     e = 0 (parallel)     speedup factor

  c = 0        $n$                    $\log^{d+1} n$       $n / \log^{d+1} n$

  0 < c < 1    $n$                    $n^c \log^d n$       $n^{1-c} / \log^d n$

  c = 1        $n \log^{d+1} n$       $n \log^d n$         $\log n$

  c > 1        $n^c \log^d n$         $n^c \log^d n$       $1$

That is, the faster the conquer step, the bigger the gains from parallelisation or
incrementalization. In particular, a conquer step running in $\Omega(n^{1+\varepsilon})$ yields no
asymptotic gain.
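As a worked instance of the theorem, consider a conquer step of cost $O(\log^2 n)$, the bound targeted in this paper, so that $c = 0$ and $d = 2$, with $b = 2$ as above. The two lines below simply instantiate the case analysis of the theorem; they are an illustration rather than a new result.

```latex
\begin{aligned}
\text{parallel or incremental } (a = 1,\ e = \log_2 1 = 0 = c):
  \quad & T_n = T_{n/2} + O(\log^2 n) = O(\log^{3} n) \\
\text{sequential } (a = 2,\ e = \log_2 2 = 1 > c):
  \quad & T_n = 2\,T_{n/2} + O(\log^2 n) = O(n)
\end{aligned}
```

The ratio of the two bounds, $n / \log^3 n$, is the speedup factor given by the first row of the table with $d = 2$.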



Summary In sum, using a divide-and-conquer skeleton to construct an incremental and 
parallel algorithm computing / means finding functions bin and nil such that: 

• $nil = f\ Nil$

• $bin\ (f\ l)\ x\ (f\ r) = f\ (Bin\ l\ x\ r)$

• The complexity of bin is quasilinear (and if possible better)



3 Context Free Parsing 

In this section we review the basics of context free (CF) parsing, give a specification of 
parsing in terms of transitive closure, and review the CYK and Valiant algorithms. 
3.1 Conventions and Notations

We assume a CF grammar $\mathcal{G}$, given by a quadruple $(\Sigma, N, P, S)$, where $\Sigma$ is a
finite set of terminals, $N$ is a finite set of non-terminals of which $S$ is the starting symbol,
and $P$ a finite set of production rules.

We furthermore assume an input $w \in \Sigma^*$, a sequence of terminal symbols of length
$|w|$. The input symbol at position $i$ is denoted $w[i]$. A sub-string of $w$ starting at position
$i$ (included) and ending at position $j$ (excluded), is denoted $w[i..j]$. Metasyntactic
variables standing for arbitrary strings of terminals will have the form $w_1, w_2, \ldots$ The letters
$A, B, C, \ldots$ stand for arbitrary non-terminals, while $\alpha, \beta, \ldots$ stand for arbitrary strings
(elements of $(\Sigma \cup N)^*$) and $t$ stands for a terminal symbol. Each production rule associates
a non-terminal with a string it can generate. We write $A ::= \alpha$ for $A$ generates $\alpha$.

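Definition 2 ($\to$)

$\alpha A \beta \to \alpha \gamma \beta$ iff. $(A ::= \gamma) \in P$

Definition 3 ($\xrightarrow{*}$)

The reflexive and transitive closure of the $\to$ relation is written $\xrightarrow{*}$.

Definition 4 ($\mathcal{L}$)

The input string $w$ belongs to the language $\mathcal{L}$ iff. $S \xrightarrow{*} w$. We say that
$\mathcal{G}$ generates $\mathcal{L}$.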



3.1.1 Chomsky Normal Form

The simplest implementation of the CYK and Valiant algorithms takes as input a grammar in
Chomsky Normal Form ([Chomsky, 1959]). In Chomsky Normal Form, hereafter abbreviated
CNF, the production rules are restricted to one of the following forms

  $S ::= \varepsilon$        (nullary)
  $A ::= t$                  (unary)
  $A_0 ::= A_1 A_2$          (binary)

Any CF grammar $\mathcal{G}$ generating a language $\mathcal{L}$ can be converted to a grammar
$\mathcal{G}'$ in CNF defining the same language $\mathcal{L}$. This conversion preserves many useful
properties of the input grammar. In particular:

• The size of the grammar does not increase too much: $|\mathcal{G}'| < |\mathcal{G}|^2$.

• The parse-trees generated by $\mathcal{G}'$ are a binarised version of the parse trees generated
by $\mathcal{G}$. This means that from a $\mathcal{G}'$-parse tree one can easily recover a
$\mathcal{G}$-parse tree, modulo the following caveat.

• The conversion discards unit-rule cycles (such as $A_0 ::= A_1$; $A_1 ::= A_0$). This is
good: such cycles generate infinitely many (equivalent) parse trees, which the user
generally wants to ignore anyway.

Hence we will assume from now on a grammar provided in CNF. Moreover, because it
is easy to handle the empty string specially, we conventionally exclude it from the input
language and thus exclude the nullary rule $S ::= \varepsilon$ from the set of productions $P$. In sum,
we assume that $P$ contains only unary and binary production rules. The reader interested in details
is directed to Lange and Leiß [2009] for a pedagogical account of the process of reduction
to CNF.

Given a grammar specified as above, the problem of parsing is reduced to finding a binary
tree such that each leaf corresponds to a symbol of the input and a suitable unary rule,
and each branch corresponds to a suitable binary rule. Essentially, parsing is equivalent to
considering all possible bracketings of the input, and verifying that they form a valid parse.

3.2 Charts as Matrices, Parsing as Closure 

In this section we show how to specify parsing as an equation on matrices. We start by
abstracting away from the grammar, via a ring-like structure. We define the operations
$0$, $+$, $\cdot$ and $\sigma$ as follows.

Definition 5 ($0, +, \cdot$ on $\mathcal{P}(N)$)

\[
\begin{aligned}
0 &= \emptyset \\
x + y &= x \cup y \\
x \cdot y &= \{A \mid A_0 \in x,\ A_1 \in y,\ A ::= A_0 A_1 \in P\} \\
\sigma_i &= \{A \mid A ::= w[i] \in P\}
\end{aligned}
\]

The $(\cdot)$ operation fully characterizes the binary production rules of the grammar, while
$\sigma$ captures the unary ones. We have the following properties: $(0, +)$ forms a commutative
monoid (the usual monoid of sets with union); $0$ is absorbing for $(\cdot)$; and $(\cdot)$ distributes
over $(+)$. However, and crucially, $(\cdot)$ is not associative.

\[
\begin{aligned}
x + 0 &= x \\
0 + x &= x \\
(x + y) + z &= x + (y + z) \\
x \cdot (y + z) &= x \cdot y + x \cdot z \\
x \cdot 0 &= 0 \\
0 \cdot x &= 0
\end{aligned}
\]
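To make the ring-like structure concrete, here is a minimal Haskell sketch of Definition 5 using `Data.Set` from the containers library. The grammar representation (`Grammar`, `unaryRules`, `binaryRules`), the function names and the example grammar `parens` for balanced parentheses are all assumptions made for illustration; they are not taken from the paper's implementation. Since $\sigma$ depends on the input only through the symbol $w[i]$, the sketch indexes it by a terminal.

```haskell
import qualified Data.Set as Set

type NonTerminal = String
type Terminal    = Char

-- A grammar in CNF without the nullary rule (hypothetical representation).
data Grammar = Grammar
  { unaryRules  :: [(NonTerminal, Terminal)]                  -- A ::= t
  , binaryRules :: [(NonTerminal, NonTerminal, NonTerminal)]  -- A ::= A0 A1
  }

-- An element of P(N): a set of non-terminals.
type Cell = Set.Set NonTerminal

zero :: Cell
zero = Set.empty

plus :: Cell -> Cell -> Cell
plus = Set.union

-- x . y = { A | A0 in x, A1 in y, A ::= A0 A1 in P }
times :: Grammar -> Cell -> Cell -> Cell
times g x y = Set.fromList
  [ a | (a, a0, a1) <- binaryRules g, a0 `Set.member` x, a1 `Set.member` y ]

-- sigma for a terminal t: { A | A ::= t in P }
sigma :: Grammar -> Terminal -> Cell
sigma g t = Set.fromList [ a | (a, t') <- unaryRules g, t' == t ]

-- Example: non-empty balanced parentheses in CNF.
--   S ::= L R | L X | S S     X ::= S R     L ::= '('     R ::= ')'
parens :: Grammar
parens = Grammar
  { unaryRules  = [("L", '('), ("R", ')')]
  , binaryRules = [("S", "L", "R"), ("S", "L", "X"), ("S", "S", "S"), ("X", "S", "R")]
  }
```

With this grammar, writing `l = sigma parens '('` and `r = sigma parens ')'`, the product `times parens (times parens l r) r` is the set containing only `"X"`, whereas `times parens l (times parens r r)` is empty, which illustrates the lack of associativity.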

We will then use a matrix of sets of non-terminals $C$ to record which non-terminals can
generate a given substring. The intention is that $A \in C_{ij}$ iff. $A \xrightarrow{*} w[i..j]$. See
Fig. 1 for an illustration. In parsing terminology, a structure containing intermediate parse results is
called a chart. We call the set of charts $\mathcal{C}$.

Definition 6

We lift the operations $0$, $+$, $\cdot$ from sets of non-terminals to matrices of sets of
non-terminals, in the usual manner.

Definition 7 ($0, +, \cdot$ on $\mathcal{C}$)

\[
\begin{aligned}
0_{ij} &= 0 \\
(A + B)_{ij} &= A_{ij} + B_{ij} \\
(A \cdot B)_{ij} &= \sum_k A_{ik} \cdot B_{kj}
\end{aligned}
\]

As expected, all the properties carry over to matrices; and associativity is still lacking. The
operation $\sigma$ is used to compute an upper diagonal matrix corresponding to the input $w$, as
follows.

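Definition 8 (Initial matrix)

The initial matrix, written $I(w)$, is a square matrix of dimension $|w| + 1$ such that

\[
I(w)_{ij} =
\begin{cases}
\sigma_i & \text{if } j = i + 1 \\
0 & \text{if } j \neq i + 1
\end{cases}
\]

Let $W^{(1)} = I(w)$. Note that $W^{(1)}_{i,i+1} = \sigma_i$ contains all the non-terminals which can
generate the substring $w[i..i+1]$. Let $W^{(2)} = W^{(1)} \cdot W^{(1)} + I(w)$. It is easy to see that
$W^{(2)}_{i,i+2} = \sigma_i \cdot \sigma_{i+1}$, hence it contains all the non-terminals which can generate
the substring $w[i..i+2]$. Consider now $W^{(3)} = W^{(2)} \cdot W^{(2)} + I(w)$. We have

\[
\begin{aligned}
W^{(3)}_{i,i+3} &= W^{(2)}_{i,i+2} \cdot W^{(2)}_{i+2,i+3} + W^{(2)}_{i,i+1} \cdot W^{(2)}_{i+1,i+3} \\
                &= (\sigma_i \cdot \sigma_{i+1}) \cdot \sigma_{i+2} + \sigma_i \cdot (\sigma_{i+1} \cdot \sigma_{i+2}) \\
W^{(3)}_{i,i+4} &= W^{(2)}_{i,i+2} \cdot W^{(2)}_{i+2,i+4} \\
                &= (\sigma_i \cdot \sigma_{i+1}) \cdot (\sigma_{i+2} \cdot \sigma_{i+3})
\end{aligned}
\]

Hence $W^{(3)}$ contains all possible parsings of 3 symbols, and all balanced parsings of 4
symbols. By iterating $n$ times, one obtains all the parsings of $n$ symbols. (However, as a
hint to our method for efficient parsing, it suffices to repeat the process $\log n + 1$ times to
obtain all balanced parsings of $n$ symbols).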

Definition 9 (Transitive closure) 

If it exists, the transitive closure of a matrix $W$, written $W^+$, is the smallest matrix $C$ such
that

\[ C = C \cdot C + W \]

A consequence of the above is $C \supseteq C \cdot C + I(w)$ for $C = I(w)^+$. It is clear by now that,
consequently, every possible bracketing of the products $I(w) \cdots I(w)$ is contained in $C$, and
thus all possible parsings of $w[i..j]$ are found in $C_{ij}$. Conversely, because $C$ is the smallest
matrix satisfying the property, if $C_{ij}$ contains a non-terminal then it must generate $w[i..j]$.
Algorithms which parse by computing a chart are known as chart parsers.

The above procedure specifies a recognizer: by constructing $I(w)^+$ one finds if $w$ is
parsable, but not the corresponding parse tree. Even though we focus on the recognition
problem in this paper, it is straightforward to specify parsers by using matrices of parse
trees instead of non-terminals, and adapting the operations accordingly, as we have done
in our implementation on top of BNFC.

Fig. 1. Example charts. In each chart a point at position (x,y) corresponds to a substring starting at x
and ending at y. The first parameter x grows downwards and the second one y rightwards. The input
string w is represented by the diagonal line. Dots in the upper-right part represent nonterminals. The
first chart witnesses $A \xrightarrow{*} w[i..j]$ and $B \xrightarrow{*} w[k..l]$. An instance of the rule
Z ::= XY is illustrated on the second chart.

In order to construct an efficient parallel parser, we must construct a sequence-homomorphism
from input strings to charts. Thanks to Lem. 1, it suffices to find an operator bin which
combines two charts $I(w_1)^+$, $I(w_2)^+$ and a terminal $t$ into a chart $I(w_1 t w_2)^+$.

3.3 Cocke-Younger-Kasami 

A straightforward manner to turn the above specification into an algorithm is as follows. 
Let us first remark that the product of two upper triangular matrices is upper triangular. 
Hence the closure of an upper triangular matrix must also be upper triangular. Hence, 
in every chart ever considered, every element at the diagonal and below it equals zero. 
The output of any algorithm computing the closure of $I(w)$ must satisfy the equation
$C = C \cdot C + I(w)$. Expanding it index-wise yields:

\[ C_{ij} = I(w)_{ij} + \sum_k C_{ik} \cdot C_{kj} \]

Because $C$ is upper triangular, $C_{ik} = 0$ if $k \le i$ and $C_{kj} = 0$ if $k \ge j$. Hence the sum can be
limited to the interval $[i+1..j-1]$:

\[ C_{ij} = I(w)_{ij} + \sum_{k=i+1}^{j-1} C_{ik} \cdot C_{kj} \]

Observing that the sum is empty (hence equals 0) when $j = i + 1$ and $I(w)_{ij} = 0$ otherwise, we
distinguish on that condition and obtain the two equations:

\[ C_{i,i+1} = \sigma_i \qquad (2) \]
\[ C_{ij} = \sum_{k=i+1}^{j-1} C_{ik} \cdot C_{kj} \quad \text{if } j > i + 1 \qquad (3) \]

These equations give a method to compute $C_{ij}$ by induction on $j - i$. The equations can be
re-interpreted in terms of parses and non-terminals as follows. Either

• we parse a single token $w[i]$, and the nonterminals generating it are given directly from
unary rules, or

• we parse a longer string. In this case we split it at any intermediate position $k$,
and combine the intermediate results (found in $C_{ik}$ and $C_{kj}$) in every possible way
according to binary rules.

By applying the above rules naively, the computation time is exponential in the length 
of the input; however by memoizing each intermediate result (for example by using lazy 
dynamic programming [Allison, 1992]) the complexity is merely cubic. The resulting 
dynamic programming algorithm is known as CYK, owing to its independent discoverers: 
Cocke [1969], Kasami [1965] and Younger [1967]. 
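Equations (2) and (3) can be transcribed almost literally with a lazily evaluated array, which provides the memoization mentioned above. The following is a sketch under our own assumptions, not the paper's implementation: the grammar representation, the names `cyk` and `recognises`, and the example grammar `parens` are hypothetical, and `Data.Array` and `Data.Set` come from the array and containers libraries.

```haskell
import Data.Array (Array, array, (!))
import qualified Data.Set as Set

type NonTerminal = String

-- A grammar in CNF without the nullary rule (hypothetical representation).
data Grammar = Grammar
  { unaryRules  :: [(NonTerminal, Char)]                      -- A ::= t
  , binaryRules :: [(NonTerminal, NonTerminal, NonTerminal)]  -- A ::= A0 A1
  }

-- The chart C of an input w, of dimension |w| + 1. Array elements are lazy,
-- so each cell is computed at most once, on demand.
cyk :: Grammar -> String -> Array (Int, Int) (Set.Set NonTerminal)
cyk g w = chart
  where
    n = length w
    chart = array ((0, 0), (n, n))
      [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. n] ]
    cell i j
      | j == i + 1 = sigma (w !! i)                            -- Eq. (2)
      | j >  i + 1 = Set.unions                                -- Eq. (3)
          [ times (chart ! (i, k)) (chart ! (k, j)) | k <- [i + 1 .. j - 1] ]
      | otherwise  = Set.empty                                 -- diagonal and below
    sigma t   = Set.fromList [ a | (a, t') <- unaryRules g, t' == t ]
    times x y = Set.fromList
      [ a | (a, a0, a1) <- binaryRules g, a0 `Set.member` x, a1 `Set.member` y ]

-- w is recognised iff the start symbol generates w[0..|w|].
recognises :: Grammar -> NonTerminal -> String -> Bool
recognises g s w = s `Set.member` (cyk g w ! (0, length w))

-- Example: non-empty balanced parentheses in CNF.
--   S ::= L R | L X | S S     X ::= S R     L ::= '('     R ::= ')'
parens :: Grammar
parens = Grammar
  { unaryRules  = [("L", '('), ("R", ')')]
  , binaryRules = [("S", "L", "R"), ("S", "L", "X"), ("S", "S", "S"), ("X", "S", "R")]
  }
```

For instance, `recognises parens "S" "(())()"` holds while `recognises parens "S" "(()"` does not. Each cell inspects a number of split points linear in $j - i$, which gives the cubic bound once every cell is computed exactly once.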

In the CYK algorithm, any element of the chart is computed only on the basis of elements
strictly closer to the diagonal. Hence it can be used to program the combination step
of a divide-and-conquer algorithm. The combination of two charts and a terminal,
$C = bin(A, w[i], B)$, is defined as follows. Elements of $C$ in the upper left corner are copied
from $A$; elements of the bottom right corner are copied from $B$; and elements from the top
right corner are computed using $\sigma_i$ and the CYK formula (Eq. (3)).

Even though we have produced a sequence homomorphism, it is not suitable for
parallelisation: its performance is not good enough. Indeed, the above operator has to compute
a matrix of size $n \times m$, and computing each element takes time linear in $n + m$. The
complexity of bin is therefore cubic, and as we have seen in Sec. 2, there is no asymptotic
gain to parallelisation.

3.4 Valiant 

A more subtle way to turn the transitive closure specification into an algorithm is the
following. Our task is to find a function $(\cdot)^+$ which maps a matrix $W$ to its transitive closure
$C = W^+$, which implies $C = C \cdot C + W$. As above, we do so by refinement of the definition
of transitive closure, but we adopt a divide-and-conquer approach rather than iterating
indexwise.

If $W$ is a 1 by 1 matrix, $W = 0$, and the solution is $C = 0$. Otherwise, let us divide $W$ and
$C$ in blocks as follows (for efficiency the blocks should be roughly of the same size; but
the reasoning holds for any sizes):

\[
W = \begin{bmatrix} A & X \\ 0 & B \end{bmatrix}
\qquad
C = \begin{bmatrix} A' & X' \\ 0 & B' \end{bmatrix}
\]

Then the condition $C = C \cdot C + W$ becomes

\[
\begin{bmatrix} A' & X' \\ 0 & B' \end{bmatrix}
= \begin{bmatrix} A' & X' \\ 0 & B' \end{bmatrix} \cdot \begin{bmatrix} A' & X' \\ 0 & B' \end{bmatrix}
+ \begin{bmatrix} A & X \\ 0 & B \end{bmatrix}
\]

Applying matrix multiplication and sum block-wise:

\[
\begin{aligned}
A' &= A'A' + A \\
X' &= A'X' + X'B' + X \\
B' &= B'B' + B
\end{aligned}
\]

Because $A$ and $B$ are smaller than $W$ (and still upper triangular), we know how to compute
$A'$ and $B'$ recursively ($A' = A^+$, $B' = B^+$). There remains to find an algorithm to compute the
top-right corner $X'$ of the matrix. That is (renaming variables for convenience) the problem
is reduced to finding a recursive function $V$ which maps $A$, $B$ and $X$ to $Y = V(A,X,B)$,
such that $Y = AY + YB + X$. In terms of parsing, the function $V$ combines the chart $A$ of
the first part of the input with the chart $B$ of the second part of the input, via a partial chart
$X$ concerned only with strings starting in $A$ and ending in $B$, and produces a full chart $Y$.

Fig. 2. The recursive step of function V. The charts A and B are already complete. To complete the
matrix X, that is, compute Y = V(A,X,B), one splits the matrices and performs 4 recursive calls.
Each recursive call is depicted graphically. In each figure, to complete the dark-gray square, multiply
the light-gray rectangles and add them to the dark-gray square, then do a recursive call on the triangular
matrix composed of the completed dark-gray square and the triangles.

Let us divide each matrix in blocks again:

\[
X = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}
\quad
Y = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}
\quad
A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}
\quad
B = \begin{bmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{bmatrix}
\]

(Again we assume that splitting can be done; the base cases can be obtained by dropping
the first rows and/or the second columns in the above splits.) The condition on $Y$ then
becomes
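\[
\begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}
  \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}
+ \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}
  \begin{bmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{bmatrix}
+ \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}
\]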

-FPR JFP 10 June 2014 14:55 



Efficient Parallel and Incremental Parsing of Practical Context-Free Languages 1 1 
By applying matrix multiplication and sum block-wise: 

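\[
\begin{aligned}
Y_{11} &= A_{11}Y_{11} + A_{12}Y_{21} + Y_{11}B_{11} + 0 + X_{11} \\
Y_{12} &= A_{11}Y_{12} + A_{12}Y_{22} + Y_{11}B_{12} + Y_{12}B_{22} + X_{12} \\
Y_{21} &= 0 + A_{22}Y_{21} + Y_{21}B_{11} + 0 + X_{21} \\
Y_{22} &= 0 + A_{22}Y_{22} + Y_{21}B_{12} + Y_{22}B_{22} + X_{22}
\end{aligned}
\]

By commutativity of $(+)$ and $0$ being its unit:

\[
\begin{aligned}
Y_{11} &= A_{11}Y_{11} + X_{11} + A_{12}Y_{21} + Y_{11}B_{11} \\
Y_{12} &= A_{11}Y_{12} + X_{12} + A_{12}Y_{22} + Y_{11}B_{12} + Y_{12}B_{22} \\
Y_{21} &= A_{22}Y_{21} + X_{21} + 0 + Y_{21}B_{11} \\
Y_{22} &= A_{22}Y_{22} + X_{22} + Y_{21}B_{12} + Y_{22}B_{22}
\end{aligned}
\]

Because each of the sub-matrices is smaller and because of the absence of circular
dependencies, $Y$ can be computed recursively:

\[
\begin{aligned}
Y_{21} &= V(A_{22},\ X_{21},\ B_{11}) \\
Y_{11} &= V(A_{11},\ X_{11} + A_{12}Y_{21},\ B_{11}) \\
Y_{22} &= V(A_{22},\ X_{22} + Y_{21}B_{12},\ B_{22}) \\
Y_{12} &= V(A_{11},\ X_{12} + A_{12}Y_{22} + Y_{11}B_{12},\ B_{22})
\end{aligned}
\]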

We have ignored the base cases so far because they are straightforward, except for 
the following point. When computing V(A,X,B) on matrices of dimension 1 x 1, it is 
guaranteed that A and B are equal to 0. Indeed, in that case X is just above the diagonal. 
Therefore A and B are on it and must then be 0. The result matrix is therefore equal to X. 

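In sum, with the above definitions, we have the following expression for $V$ in the
recursive case

\[
V\left(
\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},
\begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix},
\begin{bmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{bmatrix}
\right)
= \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}
\]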
In the base cases, some or all of the top and/or right sub-matrices are empty and the 
corresponding recursive calls are omitted. In terms of parsing, initially the partial chart 
X contains at the bottom-left position a single non-zero element corresponding to the 
symbol at the interface of A and B. Recursive calls progressively fill this chart, quadrant by 
quadrant. The above algorithm was first described by Valiant [1975]. A graphical summary 
is shown in Fig. 2. 

From Valiant's function $V$, one can construct the bin operator (completing the sequence
homomorphism) as follows:

\[
bin(A, t, B) = \begin{bmatrix} A & V(A, X, B) \\ 0 & B \end{bmatrix}
\quad \text{where} \quad
X = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \\ \sigma_t & \cdots & 0 \end{bmatrix}
\]

and $\sigma_t = \{A \mid A ::= t \in P\}$.

An advantage of Valiant's algorithm over CYK is that it treats whole subcharts at once,
via matrix-level multiplication and addition, while CYK explicitly refers to each element of
$C$ individually. In particular, when using a sparse-matrix representation, the multiplication
of an empty chart with any other chart is instantaneous. The ability to handle this case
efficiently is key: in the next section we observe that in many cases, charts are sparse, and
composition of charts is efficient.

When using a straightforward representation of sparse matrices as quadtrees, the
implementation of Valiant's algorithm is an elegant functional program, as can be seen in
Fig. 3.

    import Prelude (Eq (..))

    -- Usual precedences, needed because the Prelude operators are not imported.
    infixl 6 +
    infixl 7 *

    -- (*) plays the role of the product written (·) in the text.
    class RingLike a where
      zero :: a
      (+)  :: a -> a -> a
      (*)  :: a -> a -> a

    -- Sparse matrices as quadtrees; Z is a matrix containing only zeros.
    data M a = Q (M a) (M a) (M a) (M a) | Z | One a

    -- Smart constructors: an all-zero matrix is always represented by Z.
    q Z Z Z Z = Z
    q a b c d = Q a b c d

    one x = if x == zero then Z else One x

    instance (Eq a, RingLike a) => RingLike (M a) where
      zero = Z
      Z + x = x
      x + Z = x
      One x + One y = one (x + y)
      Q a11 a12 a21 a22 + Q b11 b12 b21 b22
        = q (a11 + b11) (a12 + b12)
            (a21 + b21) (a22 + b22)
      Z * x = Z
      x * Z = Z
      One x * One y = one (x * y)
      Q a11 a12 a21 a22 * Q b11 b12 b21 b22
        = q (a11 * b11 + a12 * b21) (a11 * b12 + a12 * b22)
            (a21 * b11 + a22 * b21) (a21 * b12 + a22 * b22)

    v :: (Eq a, RingLike a) => M a -> M a -> M a -> M a
    v a Z b = Z
    v Z (One x) Z = One x
    v (Q a11 a12 Z a22) (Q x11 x12 x21 x22) (Q b11 b12 Z b22)
      = q y11 y12 y21 y22
      where y21 = v a22 x21 b11
            y11 = v a11 (x11 + a12 * y21) b11
            y22 = v a22 (x22 + y21 * b12) b22
            y12 = v a11 (x12 + a12 * y22 + y11 * b12) b22

Fig. 3. Data structure for charts as sparse matrices (M), and implementation of the function V. The
tricky part compared to the mathematical development of Sec. 3.4 is the handling of empty matrices.
Care must be taken to create empty matrices (Z) whenever they contain only zero elements. This
is done by using the smart constructors q and one in matrix multiplication. The input matrices a
and b are empty iff. the matrix x has dimension one. For concision, this implementation supports
only matrices of size $2^n$ for some n. It can be extended to matrices of arbitrary dimension in a
straightforward manner by adding constructors for row and column matrices, to be used as leaves.
An implementation supporting arbitrary matrix dimensions, as well as the optimization explained in
Sec. 7.2 can be found in the BNFC repository:

https://github.com/BNFC/bnfc/blob/master/source/runtime/Data/Matrix/Quad.hs

4 Sparse Matrix Assumption and Complexity Analysis 

4.1 Model of the Input 

In practice, matrices representing charts are expected to be sparse for large inputs, that is, a
given substring is unlikely to be generated by a given non-terminal. Indeed, in most cases,
the substring starts in the middle of a construction and ends in the middle of some other,
usually unrelated, construction. This effect is illustrated in Fig. 4. In the remainder of
the paper, we assume that inputs conform to this assumption. Before explaining where it
comes from, we give its formal definition.

Definition 10 (Assumption)

There exists a constant $\alpha$ such that, for any input, the distribution of non-zero elements in
the chart $C$ corresponding to it is bounded as follows. For any square subchart $A$ of $C$ above
the diagonal,

\[ \#A \le \left\lceil \alpha \sum_{(i,j) \in A} \frac{1}{(j-i)^2} \right\rceil \]

where $\#A$ is the number of non-zero elements in matrix $A$.

We stress that the assumption involves not a grammar per se, but the language itself (i.e. 
the set of possible input strings we consider), when seen as strings generated by a given 
grammar in CNF. 

The above formula merits justification. Before using it to evaluate the complexity of 
the parsing algorithm, we will build a more precise intuition for it, by examining its 
consequences. 

Intuition based on string length Let us turn first to the interpretation of the term
$\frac{1}{(j-i)^2}$. Recall that a non-terminal in $C_{ij}$ corresponds to a substring of size
$n = j - i$ in the input. The assumption therefore says that the probability that a substring is
parseable is inversely proportional to the square of its size. (More precisely, when considering $k$
random substrings of size $n$ in a corpus of strings representative of the language, one finds on
average that $\frac{\alpha k}{n^2}$ of them correspond to a single nonterminal.) That is, by doubling
the size of the substring considered, it will be four times less likely to be parsable. This corresponds
well to intuition. Indeed, in a well-formed input, every single token can be given
independent meaning. However, a larger substring in the same well-formed input will likely
start in the middle of a non-terminal (e.g. in the middle of a function) and end up in the middle
of another, unrelated function. In an input which is organized hierarchically, it takes luck
to pick a beginning and an end which match precisely if those are far apart.

Experimental evidence The assumption we make is not strictly speaking verifiable
experimentally, because for any chart there exists an $\alpha$ such that the assumption is verified.
However, one can gain confidence in the assumption by plotting the probability of a string
to be parsable against its size. One should observe that this probability decreases with the
square of the size. In practical terms, given a chart corresponding to a large input, if one
observes a drastic cut-off in the density of non-zero elements when departing from a certain
distance from the diagonal, then the input is compatible with our assumption. In Fig. 4,
we show a chart corresponding to a fragment of C code, obtained using our algorithm.
This chart, along with all other inputs for which we have run this experiment, exhibits the
expected features. The assumption is also confirmed, albeit indirectly, by observing that
the cost analysis which depends on it holds in practice.

Fig. 4. The chart corresponding to a fragment of a C program. The input program can be found
in appendix. Two remarkable features merit commentary. First, the staircase shapes, which are
explained in Sec. 5.4. Second, some small sub-matrices near the diagonal appear to be dense. These
regions correspond to argument lists in the C program, and this iteration structure is implemented by
linear recursion rather than our special encoding of Sec. 5.



Non-suitable inputs Any input which uses nesting in linear proportion to its size
will violate our assumption. For example, the Lisp program composed of n successive
applications of cons does not satisfy our assumption.

(cons x (cons x (... (cons x nil) ...)))

It appears however that few programs are written in this style, except perhaps for
machine-generated ones. Linear constructions are often present, but they are then supported by
special syntax. Indeed the above Lisp program is invariably written as:

(list x x ... x)

Hence we provide special treatment for such special iteration syntaxes. We show in Sec. 5
how to deal with them, while respecting our assumption.



4.2 Close and far matrices 

For simplicity we consider only inputs of sizes which are powers of 2. This additional 
assumption implies that we only need to consider square matrices in our analysis. 

We first remark that because charts are always divided in the middle, a subchart $X$
considered by the algorithm is always square, and at a distance $kn$ to the diagonal, where
$k$ is some natural number and $n$ is the size of $X$. When $k = 0$ we say that $X$ is close to the
diagonal and when $k > 0$ we say that $X$ is far from the diagonal. This distinction is crucial,
because matrices close to the diagonal have $O(\log n)$ elements in them, whereas matrices far
away have a constant number of elements in them. This fact is not obvious, so we devote
the present subsection to its proof.

Definition 11 

The distance to the diagonal of a subchart is (j - i - 1) iff its bottom-most leftmost element
has index (i,j) in the complete chart. 
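
The remark that every square subchart lies at a distance which is a multiple of its own size can be checked mechanically. The following Python sketch is an illustration only and is not part of the implementation described in this paper. The names `square_children` and `all_squares` are hypothetical, and the model assumes that a square of size n at distance d is split into four quadrants whose distances are d (bottom-left), d + n/2 (top-left and bottom-right) and d + n (top-right).

```python
def square_children(n, d):
    """Quadrants of a square of size n at distance d, as (size, distance) pairs.

    Assumed model: bottom-left keeps distance d, top-left and bottom-right
    are n/2 further away, top-right is n further away.
    """
    half = n // 2
    return [(half, d), (half, d + half), (half, d + half), (half, d + n)]


def all_squares(n, d=0):
    """Enumerate every square reached by repeatedly halving a square of size n."""
    yield (n, d)
    if n > 1:
        for child in square_children(n, d):
            yield from all_squares(*child)


if __name__ == "__main__":
    squares = list(all_squares(16))
    # Every distance is a multiple of the size of the square (d = k * n).
    print(all(d % n == 0 for n, d in squares))
    # Exactly one quadrant of a close square is itself close (k = 0).
    print([d for _, d in square_children(8, 0)].count(0))
```

Under this model a close square always has exactly one close quadrant, which is the fact used later when bounding the cost of multiplying close matrices.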

Assume S(n,d) is a square sub-matrix of size n at distance d to the diagonal.

Our assumption puts upper bounds on the number of non-zero elements in S(n,d). In this
section, we will compute an asymptotic upper bound of #S(n,kn), for any k. The strategy
is to symbolically evaluate P(A), from which it is easy to infer bounds for #A, where

\[ P(A) = \sum_{(i,j) \in A} \frac{1}{(j-i)^2} \]



Triangles As a stepping stone, we consider a lower triangle T(n,d), of size n and at
distance d to the diagonal, because the above sum is then easy to evaluate symbolically.

\[
\begin{aligned}
P(T(n,d)) &= \sum_{(i,j) \in T(n,d)} \frac{1}{(j-i)^2} \\
          &= \sum_{k=1}^{n} \frac{k}{(d+k)^2} \\
          &= \psi_0(d+n+1) - \psi_0(d+1) + d\,\bigl(\psi_1(d+n+1) - \psi_1(d+1)\bigr)
\end{aligned}
\]

where $\psi$ is the polygamma function, which is approximated asymptotically by logarithms:

\[ \psi_k(n) \approx \frac{\mathrm{d}^k}{\mathrm{d}n^k} \log n \qquad (4) \]

Squares From the above result on triangles one can recover a result on squares: a square
of size n is a triangle of size 2n minus two triangles of size n:

\[ P(S(n,d)) = P(T(2n,d)) - 2\,P(T(n,n+d)) \]

We use this together with (4) and get

\[
\begin{aligned}
P(S(n,kn)) \approx{} & 2(kn+n)\left(\frac{1}{kn+n+1} - \frac{1}{kn+2n+1}\right)
                      - kn\left(\frac{1}{kn+1} - \frac{1}{kn+2n+1}\right) \\
                     & - \log(kn+1) + 2\log(kn+n+1) - \log(kn+2n+1)
\end{aligned}
\]

• if k ≥ 1, we have

\[ \lim_{n \to \infty} P(S(n,kn)) = 2\log(k+1) - \log(k+2) - \log(k) \]

and the limit converges from below. So the above expression is an asymptotic
bound for P(S(n,kn)).

• if k = 0, we have

\[ P(S(n,kn)) = P(S(n,0)) \approx 2n\left(\frac{1}{n+1} - \frac{1}{2n+1}\right) + 2\log(1+n) - \log(1+2n) \sim \log n \]

Summary We therefore have the following upper bounds on the number of non-zero elements
of a square submatrix A:

• if A is close to the diagonal, then $\#A \le \lceil \alpha \log n \rceil$

• if A is far from the diagonal (its distance is kn with k ≥ 1), then

\[ \#A \le \lceil \alpha\,(2\log(k+1) - \log(k+2) - \log(k)) \rceil \]
Remarkably, this upper bound is independent of the size of A.

Intuition based on balancing of trees To further support the validity of our assumption,
we can connect the logarithmic amount of non-zero elements in a close matrix with the
balancing factor of input trees. Consider the triangle-shaped subchart T which touches
the diagonal and a non-terminal A at distance k from it. We assume all symbols in the
triangle but closer to the diagonal combine to form A. If the symbol A can be combined
with exactly one other symbol of size βk with 0 < β < 1, it will yield exactly one symbol
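
The bounds of this subsection can be checked numerically. The Python sketch below is an illustration under stated assumptions, not part of the paper's implementation: it takes P(A) to be the sum of 1/(j-i)^2 over the elements of A, with j - i = d + 1 at the bottom-left element of a subchart at distance d. The function names `P_square`, `P_triangle` and `far_limit` are hypothetical.

```python
import math


def P_square(n, d):
    """Exact P(S(n, d)): sum of 1/(j-i)^2 over an n-by-n square at distance d."""
    total = 0.0
    for r in range(n):        # rows, counted upward from the bottom
        for c in range(n):    # columns, counted rightward from the left
            total += 1.0 / (d + 1 + r + c) ** 2
    return total


def P_triangle(n, d):
    """Exact P(T(n, d)): the k-th diagonal holds k elements with j - i = d + k."""
    return sum(k / (d + k) ** 2 for k in range(1, n + 1))


def far_limit(k):
    """Limit of P(S(n, kn)) as n grows, for k >= 1."""
    return 2 * math.log(k + 1) - math.log(k + 2) - math.log(k)


if __name__ == "__main__":
    # A square is a triangle of size 2n minus two triangles of size n.
    print(P_square(8, 3), P_triangle(16, 3) - 2 * P_triangle(8, 11))
    # Far squares: the sum stays below a constant that does not depend on n.
    print(P_square(64, 64), far_limit(1))
    # Close squares: doubling n adds roughly log 2, i.e. logarithmic growth.
    print(P_square(64, 0) - P_square(32, 0), math.log(2))
```

The first comparison exercises the decomposition of a square into triangles; the other two illustrate the constant bound for far squares and the logarithmic growth for close ones.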
4.3 Cost Estimation

We will estimate the cost as the number of elementary multiplications (multiplications 
on sets of non-terminals) to be performed. All the results of this subsection assume the 
distribution of non-zero elements discussed above. 



4.3.1 Cost of Matrix Multiplications

We start by estimating $M_n$, the cost of the multiplication of two square subcharts A and B
of size n.

Theorem 2

The complexity of subchart multiplication $M_n$ is O(1) on average and O(log n) in the worst
case.

Proof 

We proceed by case analysis on whether the matrices are close to or far from the diagonal.
Let us write $FF_n$ for $M_n$ if both matrices are far, $CF_n$ if one is close and one is far, and $CC_n$
if both are close. Let us evaluate each case:

• $FF_n = O(1)$. Indeed, both #A and #B are bounded by a constant when A and B are
far from the diagonal.

• $CC_n = O(CF_n)$. Indeed, when dividing a matrix close to the diagonal in four equal-sized
blocks, only the bottom-left corner is close to the diagonal; the other ones
are far away. The recursion for block-wise matrix multiplication then yields
$CC_n = 2\,CF_{n/2} + 6\,FF_{n/2}$. Because $FF_n$ is O(1), $CC_n$ has the same bound as $CF_n$.

• $CF_n = O(1)$. Let us assume A close and B far away. Let $B_{ij}$ for $i,j \in \{1,2\}$ be the
submatrices of the far matrix, B. After a finite number of recursion steps, there is
at most a single element in B. Therefore we can assume #B = 1, without loss of
generality. We can then weigh the cost of each recursive call by $\#B_{ij}$.

\[
\begin{aligned}
CF_n ={} & \#B_{11}\,FF_{n/2} + \#B_{21}\,FF_{n/2} + \#B_{12}\,FF_{n/2} + \#B_{22}\,FF_{n/2} \\
         & + \#B_{11}\,CF_{n/2} + \#B_{21}\,FF_{n/2} + \#B_{12}\,CF_{n/2} + \#B_{22}\,FF_{n/2} \\
     ={} & r\,CF_{n/2} + O(1)
\end{aligned}
\]

where $r = \#B_{11} + \#B_{12}$, and is 1 if the element of the matrix B is in its upper part,
and 0 otherwise. In the worst case, r = 1, and solving the recurrence using the
Master Theorem (Th. 1) gives $CF_n = O(\log n)$. In the average case we can assume an
even distribution of the non-zero element in B, which implies r = 1/2. The solution
of the recurrence is therefore $CF_n = O(1)$. □
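
The two regimes of the recurrence $CF_n = r\,CF_{n/2} + O(1)$ can be observed by unfolding it numerically. The following Python sketch is an illustration only: it replaces the O(1) term by the constant 1, takes $CF_1 = 1$ as a base case, and treats r = 1/2 as a fixed coefficient, all of which are simplifying assumptions. The name `cf_cost` is hypothetical.

```python
def cf_cost(n, r):
    """Unfold CF_n = r * CF_{n/2} + 1 with CF_1 = 1, for n a power of two."""
    cost = 1.0
    while n > 1:
        cost = r * cost + 1.0   # one level of the recurrence
        n //= 2
    return cost


if __name__ == "__main__":
    for exponent in (4, 8, 16, 20):
        n = 2 ** exponent
        # r = 1 grows with log2(n); r = 1/2 stays below the constant 2.
        print(n, cf_cost(n, 1.0), cf_cost(n, 0.5))
```

With r = 1 the unfolded cost is exactly log2(n) + 1, while with r = 1/2 it is a geometric series bounded by 2 whatever the size of the input.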



One might raise the following objection to the assumption of even average distribution of 
elements: because inputs given to a parser are generally valid, the top-rightmost element 
will be non-zero, as well as many elements on the top row, and many elements on the right 
column, violating the assumption. The refutation is the following: the topmost matrices, 
with a skewed distribution towards the top, are only involved on the left-hand-side of 
multiplications, for which we have no assumption of evenness. (Symmetrically, rightmost
matrices are only involved on the right-hand side of multiplications, and topmost rightmost
matrices are not involved in any multiplication at all.)

Another way to understand that the unevenness does not hurt is to consider the following
randomized variant of the algorithm. One artificially multiplies the size of the input by
two, and randomizes the position of the actual input inside it. This randomization makes the
distribution of elements even with respect to subchart boundaries, and at worst multiplies
the total cost by a constant.

4.3.2 Cost of the Conquer Step

We proceed to estimate the running cost $V_n$ of the valiant function V on a matrix of size n.

Theorem 3

The complexity of the V function is $O(\log n)$ on average and $O(\log^2 n)$ in the worst case.

Proof

We will compute the number of matrix multiplications performed; the worst case complexity
is obtained merely by multiplying by a log n factor.

We assume that we know the resulting chart Y = V(A,X,B). That is, $V_n$ maps Y to the
cost of running V(A,X,B). We have the following recurrence:

\[
\begin{aligned}
V_n(0) &= 0 \\
V_1(Y) &= 1 \\
V_n\begin{pmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{pmatrix}
       &= V_{n/2}(Y_{11}) + V_{n/2}(Y_{12}) + V_{n/2}(Y_{21}) + V_{n/2}(Y_{22}) \\
       &\quad + M_{n/2}(A_{12},Y_{21}) + M_{n/2}(Y_{21},B_{12}) \\
       &\quad + M_{n/2}(A_{12},Y_{22}) + M_{n/2}(Y_{11},B_{12})
\end{aligned}
\]

Because A and B are upper-triangular matrices, the subcharts $A_{12}$ and $B_{12}$ are close to the
diagonal. We distinguish two cases: either Y is far from or close to the diagonal. In the former
case we let $V_n = F_n$ and in the latter case $V_n = C_n$.
Y far All sub-matrices of Y are far from the diagonal. The recurrence then specializes to:

\[
\begin{aligned}
F_n(0) &= 0 \\
F_1(Y) &= 1 \\
F_n\begin{pmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{pmatrix}
       &= F_{n/2}(Y_{11}) + F_{n/2}(Y_{12}) + F_{n/2}(Y_{21}) + F_{n/2}(Y_{22}) + O(1)
\end{aligned}
\]

Because Y has a constant number of non-zero elements, a fortiori so has X, therefore most
recursive calls will return immediately, and on average only one recursive call needs to be
counted. We thus have

\[ F_n = F_{n/2} + O(1) \]

Hence we use the Master Theorem with a = 1, b = 2 and f(n) = 1. We are therefore in the
case c = e, and obtain $F_n = O(\log n)$.

Y close Out of the four submatrices of Y, $Y_{21}$ is close to the diagonal and the other three
are far from it. Therefore the recurrence specializes to:

\[
\begin{aligned}
C_1 &= 1 \\
C_n &= C_{n/2} + 3\,F_{n/2} + O(1) \\
    &= C_{n/2} + O(\log n)
\end{aligned}
\]

We use the Master Theorem with a = 1, b = 2 and f(n) = O(log n). We are in the case
c = e, and obtain $C_n = O(\log^2 n)$. □
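
The growth rates obtained from the Master Theorem can be cross-checked by unfolding the two recurrences directly. The Python sketch below is an illustration only: every O(1) term is replaced by the constant 1, and the base cases $F_1 = C_1 = 1$ are assumed. The name `conquer_counts` is hypothetical.

```python
def conquer_counts(n):
    """Return (F_n, C_n) for n a power of two, with every O(1) term taken as 1.

    F_1 = C_1 = 1,  F_n = F_{n/2} + 1,  C_n = C_{n/2} + 3 * F_{n/2} + 1.
    """
    f, c = 1, 1
    while n > 1:
        # The right-hand side uses the old f, i.e. F_{n/2}.
        f, c = f + 1, c + 3 * f + 1
        n //= 2
    return f, c


if __name__ == "__main__":
    for exponent in (2, 4, 8, 16):
        n = 2 ** exponent
        print(n, conquer_counts(n))
```

Writing m = log2(n), the unfolded values are F = m + 1 and C = 1 + m + 3m(m+1)/2, that is, logarithmic and squared-logarithmic growth respectively.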

4.3.3 Total Cost 

We can proceed to compute the total cost of our algorithm $T_n$ on an input string of size
n = |w|. Again, we use the Master Theorem. We divide the input into two parts, so b = 2.
We assume that the input is already provided as a balanced tree representing the matrix
I(w), and so the cost of the divide step is zero. Therefore f(n) is the cost of the conquer
step only. This step involves a matrix close to the diagonal, so $f(n) = C_n = O(\log^d n)$, and
in turn c = 0. The constant d is 2 if one considers the average case or 3 in the worst case.

\[ T_n = a\,T_{n/2} + O(\log^d n) \]

• If we assume a sequential execution of the two sub-problems then we have a = 2. In
turn, e = 1 and T(n) = O(n).

• If we assume perfect parallelisation of sub-problems, or an incremental situation,
where one of the sub-solutions can be reused, then a = 1. In turn, e = 0 and
$T(n) = O(\log^{d+1} n)$.

Valiant's evaluation (1975) for $V_n$ is $O(n^\gamma)$, for some γ between 2 and 3 (the exact value
depends on the matrix multiplication algorithm used). In his case c = γ and d = 0, yielding
$T(n) = O(n^\gamma)$, whatever the value of a. That is, according to Valiant's analysis, making an
incremental or parallel version of his algorithm would yield no benefit, while our analysis
reveals that a big payoff is at hand for usual inputs.
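
The difference between the sequential case (a = 2) and the parallel or incremental case (a = 1) can be made concrete by unfolding the recurrence for $T_n$. The Python sketch below is an illustration only: the O(log^d n) term is taken to be exactly (log2 n)^d and the base case $T_1 = 1$ is assumed. The name `total_cost` is hypothetical.

```python
import math


def total_cost(n, a, d):
    """Unfold T_n = a * T_{n/2} + (log2 n)^d with T_1 = 1, n a power of two."""
    t, size = 1.0, 1
    while size < n:
        size *= 2
        t = a * t + math.log2(size) ** d
    return t


if __name__ == "__main__":
    for exponent in (10, 15, 20):
        n = 2 ** exponent
        # a = 2: cost proportional to n.  a = 1: cost polylogarithmic in n.
        print(n, total_cost(n, 2, 2), total_cost(n, 1, 2))
```

With a = 2 doubling the input roughly doubles the cost, whereas with a = 1 the cost for n = 1024 and d = 2 is only 1 + (1 + 4 + ... + 100) = 386.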
Fig. 5. Running time of the V function as a function of the size of the input, using semi-logarithmic
scale. The grammar is that corresponding to the encoding of t* using the technique described in
Sec. 5. The next data point (input size $2^{23}$) could not be obtained due to running out of memory. The
curve is the graph of a quadratic function which fits the measurements.

4.3.4 Experiments 

We have conducted two sets of experiments on the running time of the algorithm. All
timings were obtained using the CRITERION library [O'Sullivan, 2013], on an Intel Core
2 at 2.13GHz. All programs were compiled with GHC 7.6.1. In the first set, we have
measured the performance on a practical language on practical inputs, to confirm that the
function is fast enough to use as an incremental parser in an interactive setting. To do so,
we have run our BNFC implementation on a C grammar to produce the σ and (·) functions,
and tested the running time of the V function on a large C program, extracted from
the Linux kernel scheduler (https://github.com/torvalds/linux/blob/master/kernel/sched/core.c;
preprocessor directives as well as typedefs found in it were
expanded by hand). The input was divided into a left part and a right part of equal sizes, and
a middle symbol. The complete charts for the left and the right part were computed, then
we measured the time of the V function on the charts and the singleton chart containing
the middle symbol. After collecting 100 samples, CRITERION reported a mean runtime
of 320.1469 μs, with a standard deviation of 23.06691 μs. This is well within acceptable
limits for interactive use: most people cannot perceive a delay less than a millisecond.

In the second set of experiments, we tested the V function on generated inputs of various
sizes, to confirm our calculation of the worst case running time. The grammar is that
corresponding to the encoding of t* (the terminal t repeated an arbitrary number of
times) using the technique described in Sec. 5 (which ensures that our assumption is
verified with α close to 1). The inputs were a repetition of that terminal symbol. The results
are shown in Fig. 5. We observe that the measurements, when drawn on a semi-logarithmic
scale, fit a quadratic curve, which agrees with the theoretical cost estimation.
5 Iteration in Context-Free Grammars

5.1 The Problem With Iteration

While we have worked hard to ensure the efficient handling of the non-associative aspect 
of CF parsing, we have neglected so far that most CF languages feature regular iteration; 
that is, associative concatenation rules. Without special treatment, such associative rules 
cause severe inefficiencies in the algorithm as presented so far. 

Iteration is technically known as Kleene closure, and is written here as a postfix star
(*). In context-free grammars, it can be (and usually is) encoded either as left or right
recursion. For example a rule A ::= Y* is typically encoded as follows.

A ::= ε
A ::= A Y

The problem with this encoding is two-fold. First, inputs consisting mostly of a sequence
of Y necessarily violate our assumption on inputs: the depth of the parse tree grows linearly
with the size of the input.

Second, the generated AST will necessarily be linear. Consequently, as we have seen in
the introduction, this linear shape would preclude efficient parallel or incremental processing
of the AST by computations consuming it.

One could possibly imagine working around the first problem with creative algorithmic 
devices. However it is clear that the second problem is intrinsic to the encoding of iteration 
as linear recursion. Hence we take the stance that special support for iteration is necessary 
in any parallel or incremental parser. 
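
The contrast between the two tree shapes can be illustrated with a few lines of Python. This is a sketch for illustration only: trees are modelled as nested tuples, and the names `left_recursive_tree`, `balanced_tree` and `depth` are hypothetical rather than taken from any implementation.

```python
def left_recursive_tree(ys):
    """Parse tree of a sequence of Y under A ::= ε and A ::= A Y."""
    tree = ()                  # A ::= ε
    for y in ys:
        tree = (tree, y)       # A ::= A Y
    return tree


def balanced_tree(ys):
    """A balanced binary tree over a non-empty sequence of Y."""
    if len(ys) == 1:
        return ys[0]
    mid = len(ys) // 2
    return (balanced_tree(ys[:mid]), balanced_tree(ys[mid:]))


def depth(tree):
    """Depth of a nested-tuple tree, computed iteratively so that deep
    linear trees do not exhaust the recursion limit."""
    best, stack = 0, [(tree, 1)]
    while stack:
        node, d = stack.pop()
        best = max(best, d)
        if isinstance(node, tuple):
            stack.extend((child, d + 1) for child in node)
    return best


if __name__ == "__main__":
    ys = ["Y"] * 1024
    print(depth(left_recursive_tree(ys)), depth(balanced_tree(ys)))
```

For 1024 elements the linear encoding yields a tree of depth 1025, while a balanced tree over the same sequence has depth 11.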



5.2 Towards an Efficient Encoding 

Instead of a linear, unary encoding of iterations, one can attempt a binary tree encoding. 
One might propose the following encoding: 

A ::=AA 
A ::=Y 

However this encoding accepts all possible associations of sequences of Ys, in particular
also linear ones. One might attempt to mend the rules by using a more clever encoding,
say:

$A_{k+1} ::= A_k A_k$

Ignoring that it codes only lists of size $2^n$ for some n, our second condition on inputs is still
violated. Indeed, in a sequence of Y, any subsequence of length $2^n$ for some n would be
recognized. This means that there would be a lot of overlap between possible parse trees.

In the remainder of the section we describe a way to keep the rule A ::= AA, but tweak 
the parsing algorithm so that for any sequence of Ys only a single association is considered. 
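
To make the overlap concrete, one can count how many spans of a sequence of n identical Y symbols end up in the chart under each encoding. The Python sketch below is an illustration only, written as a plain CYK-style recogniser rather than the algorithm of this paper; the names `cyk_chart_size` and `power_of_two_chart_size` are hypothetical.

```python
def cyk_chart_size(n):
    """Number of spans (i, j) of Y^n deriving A under A ::= A A and A ::= Y."""
    derives = [[False] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        derives[i][i + 1] = True                      # A ::= Y
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            derives[i][j] = any(derives[i][k] and derives[k][j]
                                for k in range(i + 1, j))   # A ::= A A
    return sum(derives[i][j] for i in range(n) for j in range(i + 1, n + 1))


def power_of_two_chart_size(n):
    """Number of spans recognised when only lengths 1, 2, 4, ... are allowed,
    as with the rule A_{k+1} ::= A_k A_k: one span per offset and length."""
    count, length = 0, 1
    while length <= n:
        count += n - length + 1
        length *= 2
    return count


if __name__ == "__main__":
    n = 16
    print(cyk_chart_size(n), power_of_two_chart_size(n), 2 * n - 1)
```

For n = 16 the first encoding fills all 136 spans of the chart and the second still fills 54, whereas a single balanced parse tree needs only 2n - 1 = 31 nodes.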



5.3 Oracle-Sensitive Parsing 
Fig. 6. Example chart for the grammar $A_{k+1} ::= A_k A_k$.

Fig. 7. Matching a list using the oracle-sensitive algorithm. We assume that only one non-terminal
Y is involved and thus show only the bit-tags. Considering only the non-terminals which cannot
be combined using the rule $Y ::= Y^0 Y^1$, the chart features a sequence of $Y^1$ (of increasing size),
followed by a sequence of $Y^0$ (of decreasing size).

Overview Each nonterminal will come with a bit indicating whether it should be used
as a left or right child in the parse tree. The bit will be chosen by an oracle upon
reduction of the nonterminal, so that the tree will be balanced. We write $Y^b$ for the
non-terminal Y annotated with bit b. The main rule constructing trees is then written:

$Y ::= Y^0 Y^1$

This restricts which trees are explored. After parsing with this rule, we obtain a sequence
of $Y^1$ (unmatched right children) of growing size followed by a sequence of $Y^0$ (unmatched
left children), as depicted in Fig. 7. These nodes will then be collected using special rules.
Assuming that $C_0$ and $D_0$ delimit the list of non-terminals Y*, the collecting rules would
be written:

\[
\begin{aligned}
C &::= C_0     & \qquad D &::= D_0 \\
  &::= C\,Y^1  &          &::= Y^0\,D
\end{aligned}
\]

And the final list can be produced by the rule L ::= CD.

The delimiters $C_0$ and $D_0$ are necessary so that only one collection of $Y^1$ and only one
collection of $Y^0$ are needed, thereby ensuring a good performance. Without delimiters, every
combination of sequences of $Y^1$ and $Y^0$ would need to be considered. An intermediate
situation is where only one delimiter is present, say the opening one. In that case, only one
list of $Y^1$ is considered, but many sequences of $Y^0$ would be considered.
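
The shape of the residue left by the rule $Y ::= Y^0 Y^1$ can be simulated on a flat list of bit-tags. The Python sketch below is a simplified sequential model and not the chart-based algorithm: it ignores the sizes of the subtrees and simply rewrites adjacent tags. The name `reduce_with_oracle` and the representation of the oracle as a zero-argument function are assumptions made for the illustration.

```python
import random


def reduce_with_oracle(bits, oracle):
    """Repeatedly apply Y ::= Y^0 Y^1 to a list of bit-tags.

    An adjacent pair (0, 1) is replaced by a single new Y whose tag is
    supplied by the oracle, until no such pair remains.
    """
    bits = list(bits)
    i = 0
    while i + 1 < len(bits):
        if bits[i] == 0 and bits[i + 1] == 1:
            bits[i:i + 2] = [oracle()]
            i = max(i - 1, 0)   # the new tag may pair with its left neighbour
        else:
            i += 1
    return bits


if __name__ == "__main__":
    random.seed(1)
    tags = [random.getrandbits(1) for _ in range(40)]
    residue = reduce_with_oracle(tags, lambda: random.getrandbits(1))
    # Whatever the oracle answers, the residue is a run of 1s followed by 0s.
    print(residue)
```

Since the loop only stops when no tag 0 is immediately followed by a tag 1, the remaining tags always form a sequence of $Y^1$ followed by a sequence of $Y^0$, independently of the bits chosen by the oracle.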

Oracle-Sensitive Grammar Formalism In general, we extend productions so that non-terminals
on a right-hand side are tagged with a bit. Formally, we extend the syntax of the
productions as follows, where $b_1, b_2, \ldots$ range over bits:

• $A ::= B^{b_1} C^{b_2}$

• $A ::= t$, for $t \in \Sigma$

We allow, as a shorthand, non-annotated non-terminals in the right-hand side
of a production rule. The production then stands for a pair of productions with either
annotation. That is, $A ::= \alpha_0 B \alpha_1$ is a shorthand for the pair of rules $A ::= \alpha_0 B^0 \alpha_1$ and
$A ::= \alpha_0 B^1 \alpha_1$.

Algorithm The implementation takes a grammar written using a special construction for
iteration and translates it to the above formalism appropriately. The algorithmic part of the
parsing procedure remains the same as previously. The part which changes is the operators
generating and combining non-terminals, as follows.

Definition 12

\[
\begin{aligned}
\sigma_i  &= \{ A^b \mid A ::= w[i] \in P \} \\
x \cdot y &= \{ A^b \mid B^{b_1} \in x,\ C^{b_2} \in y,\ A ::= B^{b_1} C^{b_2} \in P \}
\end{aligned}
\]
where the output bit b comes from the oracle.

The transitive closure function of I(w) modified to use the above version of the (·) operator
is called $T_p$ in the remainder.
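
The modified combination operator can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the implementation described in this paper: tagged non-terminals are modelled as (name, bit) pairs, productions as a dictionary from pairs of tagged non-terminals to the heads they produce, and the oracle as a zero-argument function consulted once per combination. The name `combine` is hypothetical.

```python
def combine(x, y, productions, oracle):
    """x · y = { A^b | B^b1 in x, C^b2 in y, A ::= B^b1 C^b2 in P }.

    x, y        : sets of (name, bit) pairs
    productions : dict mapping ((B, b1), (C, b2)) to a list of heads A
    oracle      : function returning the output bit b (assumed: one call
                  per combination)
    """
    b = oracle()
    return {(head, b)
            for left in x
            for right in y
            for head in productions.get((left, right), ())}


if __name__ == "__main__":
    # The single rule Y ::= Y^0 Y^1.
    productions = {(("Y", 0), ("Y", 1)): ["Y"]}
    print(combine({("Y", 0)}, {("Y", 1)}, productions, lambda: 1))
    print(combine({("Y", 1)}, {("Y", 0)}, productions, lambda: 1))
```

Only a left operand tagged 0 and a right operand tagged 1 combine; the tag of the result is whatever the oracle returns, which is why the grammar writer cannot depend on it.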

Formalization and proof We proceed to prove that the above implementation indeed 
recognizes the intended language. But first, we must define the meaning of our extended 
grammar formalism and show that it corresponds to our needs. 

The main issue is that the algorithm behaves non-deterministically, in the sense that the 
grammar-writer does not have access to the bits generated by the oracle. The rest of the 
section is structured as follows: 

1. we define a generation relation restricted to a given source of bits p, which represents
the oracle;

2. we show that the algorithm decides the above relation for a specific (but intangible) p;

3. we narrow the acceptable grammars to those which are oblivious to p (describe
languages independent of p);

4. we provide a toolkit which enables one to identify and construct such oblivious grammars;

5. and finally we show that our encoding of iteration preserves obliviousness.

Oracle We define a new generation relation indexed by a stream of bits p. This 
stream of bits wholly models the oracle. 

The meaning of production rules annotated with bits can then be given. We first define a 
1-step generation relation indexed by a single bit. 

Definition 13 (bit-indexed generation)

• if $(A ::= B^{b_1} C^{b_2}) \in P$, then $w_0 A^b \alpha \overset{b}{\longmapsto} w_0 B^{b_1} C^{b_2} \alpha$

• if $(A ::= x) \in P$, then $w_0 A^b \alpha \overset{b}{\longmapsto} w_0 x \alpha$

Crucially, the rules require the relation to act on the first nonterminal in a string. This forces
the bit-stream p to be used in a deterministic way. Otherwise, the relation could use each
bit of p in an arbitrary place, essentially bypassing the instructions of the oracle transmitted
via the bitstream p.

Definition 14 (stream-indexed generation)

The relation $\alpha \overset{p}{\longmapsto} w$ is inductively defined as follows.

• If $\alpha \overset{b}{\longmapsto} \gamma$ and $\gamma \overset{p}{\longmapsto} w$ then $\alpha \overset{bp}{\longmapsto} w$

Algorithm The algorithm decides the $\overset{p}{\longmapsto}$ relation, but only for one particular bit-stream
p (which the grammar-writer has no control over).

Theorem 4

For every p, $A^b \overset{p}{\longmapsto} w_{ij}$ iff $A^b \in T_p(w)_{ij}$

Proof

By induction on the decomposition structure of the matrix (done by T). □

Obliviousness Ultimately, we do not want the language defined using our formalism to 
depend on the actual stream p of bits generated by the oracle, because this is out of the 
control of the grammar writer. That is, if a string is generated using some p, it should be 
generated with every p. 

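We first remark that the set of strings generated by any given tagged non-terminal always
depends on p. Hence instead we have to consider the strings generated by sets of non-terminals
(and in general sets of strings). We thus define the following relations, using Γ,
Δ and Σ to range over sets of strings.

Definition 15

• $\Gamma \overset{\exists}{\longmapsto} w$ iff $\exists p.\ \exists \alpha \in \Gamma.\ \alpha \overset{p}{\longmapsto} w$

• $\Gamma \overset{\forall}{\longmapsto} w$ iff $\forall p.\ \exists \alpha \in \Gamma.\ \alpha \overset{p}{\longmapsto} w$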



Definition 16

A set of strings Γ is called oracle-oblivious if the set of strings of terminals generated by it
is insensitive to non-determinism; that is, for any $w_0$, if $\Gamma \overset{\exists}{\longmapsto} w_0$ then $\Gamma \overset{\forall}{\longmapsto} w_0$.

Definition 17

We write $\bar{A}$ for the set $\{A^0, A^1\}$.

Definition 18 (well-formed grammar)

An oracle-sensitive grammar is well-formed if $\bar{S}$ is oracle-oblivious.

We can then show that obliviousness fulfills its purpose: the sensitivity to p introduced
in the algorithm is indeed hidden by obliviousness.

Theorem 5

If $\bar{A}$ is oracle-oblivious then

$\bar{A} \overset{\forall}{\longmapsto} w_{ij}$ iff $\exists p.\ A^b \in T_p(w)_{ij}$, for some bit b

Proof

left-to-right direction By definition, $\bar{A} \overset{\forall}{\longmapsto} w_{ij}$ implies in particular that there exists a p
and a b such that $A^b \overset{p}{\longmapsto} w_{ij}$. Th. 4 yields the desired conclusion.

right-to-left direction Because of the obliviousness of $\bar{A}$ it suffices to prove that
$\exists p.\ A^b \in T_p(w)_{ij}$, for some bit b, implies $\exists p.\ \exists b.\ A^b \overset{p}{\longmapsto} w_{ij}$. Again, Th. 4, in the
right-to-left direction, yields the desired conclusion.
□

A kit for well-formed grammars Given a grammar definition using bit-annotations
arbitrarily, it is hard to decide whether it is well-formed. Hence we define the following
relation, which enables us to reason about obliviousness compositionally.

Definition 19

$\Gamma \Rightarrow \Delta$ iff for every $w_0$,

• if $\Gamma \overset{\exists}{\longmapsto} w_0$ then $\Delta \overset{\exists}{\longmapsto} w_0$.

• if $\Delta \overset{\forall}{\longmapsto} w_0$ then $\Gamma \overset{\forall}{\longmapsto} w_0$.

The above relation is constructed to transport obliviousness:

Lemma 2

If $\Gamma \Rightarrow \Delta$ and Δ is oracle-oblivious, then so is Γ.

Proof

Direct consequence of the definition. □

Lemma 3

1. $\Rightarrow$ is reflexive and transitive.

2. If $\Gamma \Rightarrow \Delta$ then $\Gamma\Sigma \Rightarrow \Delta\Sigma$ and $\Sigma\Gamma \Rightarrow \Sigma\Delta$.

3. Assume a non-terminal A and Γ its set of productions. Then $\bar{A} \Rightarrow \Gamma$.

Proof

1. and 3. are direct consequences of the definitions. The proof of 2. is tedious but
straightforward, and similar in style to the proof of Lem. 4 and thus omitted. □
The above lemma means that, if productions are written without bit annotations (they
generate all possible annotations), then they preserve obliviousness. Hence, a grammar 
written without annotations is necessarily well formed. Because our encoding of iteration 
also preserves obliviousness, this in turn means that, if one uses annotations only to encode 
iteration in the pattern we prescribe, the grammar is then well-formed. 

Encoding iteration As a reminder, we encode $L ::= C_0 Y_0^* D_0$ as

\[
\begin{aligned}
Y &::= Y_0 \\
  &::= Y^0\,Y^1 \\
C &::= C_0 \\
  &::= C\,Y^1 \\
D &::= D_0 \\
  &::= Y^0\,D \\
L &::= C\,D
\end{aligned}
\]

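Theorem 6

$\bar{L} \Rightarrow \bar{C}_0 \bar{Y}_0^* \bar{D}_0$

Proof

We construct the relation in the following stages.

1. $\bar{L}$

2. $\bar{C}_0 \{Y^1\}^* \{Y^0\}^* \bar{D}_0$

3. $\bar{C}_0 \bar{Y}^* \bar{D}_0$

4. $\bar{C}_0 \bar{Y}_0^* \bar{D}_0$

Lem. 3 gives the relation between 1 and 2 and between 3 and 4. Only the step between 2
and 3 requires special treatment: it depends on the relation

$\{Y^1\}^* \{Y^0\}^* \Rightarrow \bar{Y}^*$

Proving it requires two preservation lemmas for every $w_0$:

• if $\{Y^1\}^* \{Y^0\}^* \overset{\exists}{\longmapsto} w_0$ then $\bar{Y}^* \overset{\exists}{\longmapsto} w_0$.
• if $\bar{Y}^* \overset{\forall}{\longmapsto} w_0$ then $\{Y^1\}^* \{Y^0\}^* \overset{\forall}{\longmapsto} w_0$.

The first one is an easy consequence of the ability to choose any possible p in the $\overset{\exists}{\longmapsto}$
relation. The second one is the cornerstone of our method, and is proved in the following
lemma. □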

Lemma 4

Let $w \in \Sigma^*$ and $\alpha \in \bar{Y}^*$. If $\alpha \overset{p}{\longmapsto} w$ then there exist $\beta \in \{Y^1\}^*$ and $\gamma \in \{Y^0\}^*$ such that
$\beta\gamma \overset{p}{\longmapsto} w$.

Proof

By induction on the length of α. If α is in the required form, we have the result. If not,
then the subsequence $Y^0 Y^1$ can be found at least once in α:

$\alpha = \alpha_0 Y^0 Y^1 \alpha_1$

We can decompose w into two parts $w_0$ and $w_1$ such that

$\alpha_0 \longmapsto w_0$ and $Y^0 Y^1 \alpha_1 \longmapsto w_1$

But, for any b, we have $Y^b \alpha_1 \longmapsto Y^0 Y^1 \alpha_1$. Therefore, $Y^b \alpha_1 \longmapsto w_1$ and in turn
$\alpha_0 Y^b \alpha_1 \longmapsto w$.

We can then use the induction hypothesis on $\alpha_0 Y^b \alpha_1$ to obtain β and γ satisfying the
conditions of the lemma. □

5.4 Performance 

The above encoding yields good performance in practice, even with a naive implementation 
of the oracle providing the stream of bits p, which does not produce perfectly balanced 
trees. Indeed, Fig. 7 shows the chart generated from a sample C program. It exhibits the 
drastic cut-off in non-zero node density formalized in Def. 10, except for a few linear 
shapes, as one can observe. These are caused by our implementation of the oracle, which 
is naive. In our implementation, the bit which is generated is a parameter of the function V, 
and it is flipped (deterministically) for some recursive calls. This means that, inside a given 
subchart, all instances of associative rules either right-associate or left-associate, yielding 
a linear arrangement of results in the chart. Yet, this strategy for bit generation is the best
we have found with respect to observed performance. The reason might be that a more even
distribution of results in the chart worsens the locality of non-zero data, yielding smaller
zero subcharts.

6 Related Work 

6.1 Our Own Previous Work 

Claessen [2004] wrote a paper titled "parallel parsing processes", which however has only
tenuous connections with the present work. The paper of 2004 presents a parsing technique
based on usual sequential parsers, but where disjunction is represented by processes
running concurrently. An advantage of that technique is that the parser processes the input 
string in chunks that can be discarded as soon as the parser has analyzed them. 

Bernardy [2009] has shown how to combine the above idea with the online parsers 
of Hughes and Swierstra [2003]. This makes the resulting parsing algorithm suitable for 
incremental parsing in an editing environment such as Yi [Bernardy, 2008]. However the
method is brittle, because grammars need to be expressed in a special-purpose formalism,
and error-correction must be "baked into" the grammar. In contrast, the method presented here
accepts grammars in Backus-Naur Form (see Sec. 7.6); only iterative structures need to be
changed to use the special construction of Sec. 5. One does not have to worry about error
recovery because all substrings are parsed.

The present work was presented, in a draft version, at ICFP [Bernardy and Claessen, 
2013]. Besides correcting several minor mistakes and improving the presentation, the present 
version gives a better analysis of the complexity of the parsing algorithm: we show that the 
algorithm is asymptotically faster by a factor of log n in the average case.

6.2 Special Support for Iteration

The assumption we make on inputs, which is tied to the balancing of the parse trees, is
partially inspired by work by Wagner and Graham [1998]. They show that linear parse trees
cannot be handled efficiently (in parallel or incrementally), because updating a structure 
requires time proportional to its depth. Wagner and Graham then deduce that efficient 
incremental parsing requires a special purpose support for iteration, as we have done in 
Sec. 5. 

6.3 General CF Parsing 

Perhaps the most well known method for parsing general CF languages is that of Tomita 
[1986]. This method has in common with ours that it achieves linear performance on well- 
behaved inputs, while degrading gracefully to the best possible performance (cubic) in the 
worst case. 

The main difference between the methods is that Tomita's algorithm processes the input
sequentially, while we can process it in any bottom-up order. This means that the condition for
well-behaved inputs is different for the two methods. In Tomita's case, the condition is that,
at any point during the parsing, the amount of ambiguity is small (bounded by a constant),
implying that the next action of the parser is most of the time determined by the next 
symbol in the input. In our case, it is captured by Def. 10, which essentially means that 
the input should be hierarchical. Tomita's condition does not imply ours: linearly arranged 
inputs can be deterministic. Checking the other implication is left for future work. It is 
not easy to conclude: our condition imposes non-local conditions which may or may not 
restrict non-determinism in a linear processing of the input. 

The chief advantage of our method is its divide-and-conquer structure, which means that
it can be used in a standard parallel or incremental framework. Tomita inherits essential
use of the sequential processing of the input from LR parsing, making his technique not 
amenable to parallelisation. 

6.4 Parallel Parsing 

There is a wealth of previous work devoted to efficient recognition and parsing of context- 
free languages on abstract parallel machines, so much that a comprehensive survey of the 
field is out of the scope of this paper. The situation can however be summarized as follows: 
to the best of our knowledge, before this work, algorithms proposed for parallel parsing 
either need an unrealistic number of processors, or they target a language class which is 
too restrictive to be of practical interest. 

Too many processors Sikkel and Nijholt [1997] describe a parallel algorithm (in section
6.3) which can recognize a string of length n in O(log n) time, but it requires $O(n^6)$
processors in the worst case.

A line of work involving Rytter gives a dozen complexity results for various subclasses
of CF and various abstract machines. The most closely related results are perhaps
the following.

Chytil et al. [1991] present a simple parallel algorithm recognizing unambiguous context-free
languages on a CREW PRAM in time $\log^2 n$ with only $n^3$ processors. The similarity
with our work is that the authors restrict the languages they accept to a well-behaved subset
of CF to obtain sensible running time. In our opinion the present work captures better the
actual sets of inputs found in the actual practice of CF parsing.

Too restrictive grammars Rytter and Giancarlo [1987] analyze an algorithm which can
parse a bracket grammar in O(log n) time with O(n/log n) processors. This is fast and does
not use too many processors, but is restricted to languages where the grouping of non-terminals
is completely explicit in the input: each production rule starts with an opening
bracket and ends with a closing bracket. 

6.5 Automatic Parallelisation 

Gibbons [1996] (following the work of Bird [1986]) states that if a function can be expressed
both as a leftwards and rightwards function (foldl and foldr), then it can also be
expressed as a sequence homomorphism. Morita et al. [2007] use this theorem to derive
such sequence homomorphisms algorithmically. They present a tool which can produce a
sequence homomorphism when given functions expressed both as foldl and foldr.

It would be interesting to check if the method could derive an efficient parallel parsing 
algorithm. As far as we understand, the method might (possibly with extensions) be able to 
discover the Valiant algorithm from a leftwards and a rightwards CYK algorithm. However,
we think that discovering the interest of a sparse matrix representation is out of reach: it
requires a creative step which is hard to capture in an automatic tool.

Mainstream parsing algorithms (such as LL(k) or LALR(k)) also seem hard to parallelise 
using an automatic method. First, it is not clear how one can reverse such a parser, because 
the definition of the algorithm is tightly coupled with the direction of parsing (as their names
indicate). Second, Morita et al. [2007] do not give an upper bound on the efficiency of the
generated combination operator (bin), but only measure the performance of the generated
code on a number of examples. As we understand there may be situations where the method 
produces an associative operator of linear (or worse) complexity, thereby yielding modest 
parallelisation gains (if any). 

6.6 Simultaneous Incremental and Parallel Computation 

Burckhardt et al. [2011] propose a model of computation which captures both incremental
and parallel execution. Their model is based on concurrently running tasks which commit 
their results atomically upon completion. Our work is instead based on the well-known 
sequence homomorphism as model of parallel and incremental computation. 

7 Discussion 

7.1 Destructive Updates 

We were tempted to solve the problem of iteration by using destructive updates, that is, to
make associative rules such as Y ::= YY consume their arguments: when a Y non-terminal
is added to the chart using the above rule, the two Y non-terminals that compose
it would be removed. We have attempted this solution, but faced a couple of issues, which
will not surprise an audience of functional programmers.

• On the theoretical side, reasoning about parsing with destructive updates of the chart 
has proven intractable. The generation relation describing which strings are recog- 
nized by such a parser is hard to define, let alone reason about. A major difficulty 
is to combine destructive updates with a notion of non-determinism similar to that 
described in Sec. 5. Indeed, the user has no control over which particular consuming 
rule will fire first, because the order depends on the particulars of the implementation 
of Valiant's algorithm (the order in which matrix multiplications are run, etc.) and 
the exact positioning of the substrings. 

• On the practical side, the presence of updates makes for a more complicated 
implementation. It would also mean abandoning (so far unexploited) parallel opportunities 
in the matrix multiplication and the V function. 



7.2 Optimization 

In many grammars, a fair proportion of non-terminals occur only either on the left, or on 
the right of binary productions. Assume for example that A only ever occurs on the left. It 
is wasteful in this case to consider A for right-combinations, as does the algorithm we have 
presented so far. 

This optimization is available to many CF parsing algorithms, but it is especially useful 
to us, because it acts in synergy with the detection of empty matrices. Indeed, by having 
separate matrices of left-combinable and right-combinable non-terminals, each matrix 
becomes sparser. This means that some combinations can be discarded in blocks, that is, at 
the level of matrices instead of at the level of individual non-terminals. 

An additional benefit of this optimization is that it pays for the cost of tagging 
non-terminals with an extra bit, as we describe in Sec. 5. Indeed, 0-tagged non-terminals occur 
only on the left of binary productions, and 1-tagged non-terminals occur only on the right 
in our encoding of iteration. Therefore this optimization eliminates all the cost of tagging: 
instead of tagging a non-terminal with a bit, it suffices to insert it only in the relevant matrix. 
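
The separation into left-combinable and right-combinable non-terminals can be illustrated with a minimal sketch. The toy grammar and the names BINARY_RULES, project and combine are assumptions made for this example only; they are not taken from our implementation, and the sketch works at the level of a single chart cell rather than a whole sub-matrix.

```python
# Hypothetical toy CNF grammar: binary productions as (head, left, right).
BINARY_RULES = [("S", "A", "B"), ("B", "A", "C")]

# Non-terminals that occur on the left, respectively on the right, of a binary rule.
LEFTS = {left for _, left, _ in BINARY_RULES}
RIGHTS = {right for _, _, right in BINARY_RULES}


def project(nonterminals):
    """Split a set of non-terminals into its left- and right-combinable parts."""
    nts = set(nonterminals)
    return nts & LEFTS, nts & RIGHTS


def combine(left_combinable, right_combinable):
    """Heads of the binary rules applicable to two adjacent substrings.

    `left_combinable` is the left-combinable part of the left substring,
    `right_combinable` the right-combinable part of the right substring.
    If either part is empty, the whole combination is skipped at once.
    """
    if not left_combinable or not right_combinable:
        return set()
    return {
        head
        for head, left, right in BINARY_RULES
        if left in left_combinable and right in right_combinable
    }
```

In this toy grammar B never occurs on the left of a production, so its left-combinable part is empty and any combination with B as the left operand is discarded without inspecting a single rule.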



7.3 Implementation 

The parsing method presented here, including special support for 
iteration as presented in Sec. 5 and the optimization presented above, is implemented 
as a new back-end for the BNFC tool [Forsberg and Ranta, 2012], available in version 
2.6 and licensed under the GPL [Free Software Foundation, 1991]. The tool takes a grammar 
in BNF with annotations for efficient repetition. When run with the option 
--cnf, it produces a Haskell implementation of CNF tables and an instance of Valiant's 
algorithm using them. Like other BNFC back-ends, our implementation produces full parsers, 
not mere recognizers. 

7.4 Unexploited Parallelism 

The parallelisation that we suggest can take advantage of at most a number of processors 
proportional to the length of the input. When parsing using Valiant's algorithm, there is 
more parallelism to take advantage of (for example two of the recursive calls in the V 
function are independent from each other). However, running in parallel all recursive calls 
to V would require asymptotically more processors than the length of the input. We 
believe that this is not a reasonable assumption to make when parsing a whole input. 
However, in the case of incremental parsing, where only a tiny fraction of the input will be 
re-parsed, one might want to take advantage of such extra parallelism opportunities. 

7.5 Unexploited Incrementality 

We have suggested that the incremental version of the parser should run the V function 
O(log n) times when changing one symbol in the input. In fact, it might be possible to use 
a better implementation of the chart data structure, which would support an incremental 
update with a single run of the V function. Indeed, when changing a single symbol of the 
input, only the part of the chart which depends on that symbol (the square whose bottom- 
left corner is the symbol in question) needs to be recomputed. This improved re-use of 
results is left for future work. 
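
To make the size of the affected region concrete, here is a minimal sketch. It assumes a chart indexing convention that is only one possible choice: cell (i, j), with 0 <= i < j <= n, holds the non-terminals deriving input positions i to j-1. The function name affected_cells is hypothetical.

```python
def affected_cells(n, k):
    """Chart cells (i, j) whose substring contains input position k.

    Assumes cell (i, j) covers input positions i .. j-1, for an input of
    length n. These are exactly the cells with i <= k < j.
    """
    if not 0 <= k < n:
        raise ValueError("k must be a valid input position")
    return [(i, j) for i in range(0, k + 1) for j in range(k + 1, n + 1)]
```

Under this convention the region contains (k + 1) * (n - k) cells, with the cell (k, k + 1) of the changed symbol at one corner; all other cells of the chart can be reused unchanged.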

7.6 Chomsky Normal-Form 

Even though we assume that we transform the grammar to CNF for ease of presentation, 
this is not actually the best form to use in an implementation. In fact, it is better to convert 
the grammar to 2NF (where productions have at most 2 symbols) and derive the operations 
(•) and a with a slightly modified algorithm, following the method described by Lange and 
Leiß [2009], as we have done in our implementation. 

The conversion from Backus-Naur Form (BNF) to CNF (or 2NF) involves a division 
of long productions into binary ones. This is usually done by chaining the binary rules 
linearly. If the productions of the input grammar are long, this negatively impacts the 
performance of our algorithm, which performs best on balanced inputs. Fortunately it is 
not difficult to divide long productions into a balanced tree of binary rules. 
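
A balanced division can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used by our implementation: the function name binarize_balanced and the naming scheme for fresh non-terminals (head_1, head_2, ...) are hypothetical, and the fresh names are assumed not to clash with existing non-terminals.

```python
from itertools import count


def binarize_balanced(head, symbols):
    """Split `head ::= symbols` into a balanced tree of binary rules.

    Returns (rules, height): rules are (head, left, right) triples and
    height is the height of the resulting tree of rules.
    """
    if len(symbols) < 2:
        raise ValueError("expected a production with at least two symbols")
    fresh = count(1)  # supplies indices for fresh non-terminal names
    rules = []

    def cover(name, syms):
        # Emit rules so that `name` derives `syms`; return the tree height.
        mid = len(syms) // 2
        children, heights = [], []
        for part in (syms[:mid], syms[mid:]):
            if len(part) == 1:
                children.append(part[0])
                heights.append(0)
            else:
                child = f"{head}_{next(fresh)}"
                heights.append(cover(child, part))
                children.append(child)
        rules.append((name, children[0], children[1]))
        return 1 + max(heights)

    height = cover(head, list(symbols))
    return rules, height
```

For a production with eight symbols, the sketch yields seven binary rules arranged in a tree of height 3, whereas chaining the same seven rules linearly gives height 7.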

The CNF grammar is suitable not only for recognition of languages, but also for parsing: 
the parse trees obtained by the converted grammar are essentially a binarization of the trees 
obtained by the grammar in BNF. The aspect which cannot be preserved by the conversion 
is the presence of cycles of unit rules. However, the elimination of such cycles can only be 
seen as a benefit: they introduce an unbounded amount of ambiguity in the grammar, and 
are a symptom of a mistake in the grammar specification. 

7.7 A New Class of Languages 

The assumption we make on the input (depending on a constant a), defines implicitly a 
new class of languages. The class lies between regular and context-free languages. We 
call the class a-balanced context-free languages, or BCF(a). The use of the parameter a 
contrasts with that of the parameter k in classes such as LL(k) or LR(k). While LL(k) or 
LR(k) restricts the form that a CF grammar can take, BCF(a) does not. Instead, it restricts 
the strings of the languages. 

We have found that for a given grammar, programs are written with a shallow nesting 
structure, instead of a deep one (with the exception of regular iteration) and hence we have 
anecdotal evidence that any given programming language is a member of BCF(a), if we 
consider the language as the set of strings actually written in it by programmers. Together 
with the observation that the parsing problem for BCF(a) has lower computational complexity 
than that of general context-free languages, this makes BCF(a) worthy of study. 

In fact, because the assumption we make is not one which is enforced by usual CF 
grammars, but we still observe it to hold in practice, it must mean that the assumption is 
self-imposed by the writers of these inputs, namely programmers. This is not too surprising, 
as our assumption can be violated only by programs which exhibit an amount of nesting 
comparable to the total length of the input. As folklore goes, programmers are averse 
to deeply-nested constructions. Indeed, understanding a program with n levels of nesting 
requires remembering n levels of context. The link between the ability for a computer 
to efficiently parse an input in parallel and incrementally and for a human to do so is 
intriguing, and we hope that the present paper sheds an interesting light on it. 



7.8 Generalization 

The body of the paper does not depend on the particulars of CF recognition: we abstract 
over it via an arbitrary association operator. This means that other applications can be 
devised. A natural extension is to support CF parsing, as we have done in our imple- 
mentation. More exotic extensions are also possible. A first example would be to support 
symbol tables, which are for example necessary for proper parsing of C. In this extension, 
non-terminals would be associated with two symbol sets, one that they assume comes 
from the environment and one which they provide to the environment. The combination 
operator would reconcile these two sets. A second example is stochastic parsing. Here, 
a probability would be associated with each non-terminal and production rule, and the 
association operator would simply multiply the probabilities. 

In fact, our method can be seen as a general way to turn a non-associative operator into 
an associative one by computing all possible associations. The efficiency is recovered by 
the ability to filter out most of the results; either because the original operator discards 
them, or because there is (possibly hidden) associativity which can be taken advantage of. 
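
The idea of computing all possible associations can be sketched with a straightforward chart computation. The sketch below is the simple cubic, CYK-style enumeration, not Valiant's algorithm, and it uses no sparse representation. The name all_associations and the convention that the operator returns an iterable of results (empty when the combination is discarded) are assumptions made for this example.

```python
def all_associations(op, leaves):
    """Values obtained by bracketing `leaves` in every possible way.

    `op(x, y)` returns an iterable of results; returning an empty
    iterable discards the combination. Assumes at least one leaf.
    """
    n = len(leaves)
    # chart[(i, j)] holds every value derivable from leaves[i:j].
    chart = {(i, i + 1): {leaves[i]} for i in range(n)}
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):  # every split point of leaves[i:j]
                for x in chart[(i, k)]:
                    for y in chart[(k, j)]:
                        cell.update(op(x, y))
            chart[(i, j)] = cell
    return chart[(0, n)]
```

With integer subtraction, which is not associative, the leaves 1, 2, 3 yield both (1 - 2) - 3 = -4 and 1 - (2 - 3) = 2. With an operator that looks up binary productions and returns nothing when no rule applies, most cells stay empty, which is the filtering effect described above.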

Yet another generalization of Valiant's algorithm produces a parser for Boolean grammars, 
as recently shown by Okhotin [2014]. Boolean grammars allow one to define the 
generation of non-terminals not only by union of production rules, but also by intersection and 
complement. They can characterize non-context-free languages, such as {a^n b^n c^n | n ∈ N}. 
In this case, the ring-like structure that we have used is not sufficient: one must apply a 
Boolean function to all possible combinations of non-terminals before obtaining the parses 
of a given substring. 

7.9 The Old as New 

It strikes us that a parsing algorithm published in 1975 finds an application in the area 
of parallelisation for computer architectures of the 2010s. Further, Valiant gives no 
indication that the algorithm described should find any practical parsing application. As 
it seems, he aims only to tie the complexity of context-free recognition to that of matrix 
multiplication (via the transitive closure operation). 

Indeed, in the case of parsing (in contrast to mere recognition), subtraction of matrices 
is not defined. Hence one cannot use the efficient Strassen algorithm [Strassen, 1969] for 
multiplication, and in turn the complexity of general context-free parsing using Valiant's 
method is cubic, and fails to beat the simpler CYK algorithm. 

Our contribution is to recognize that Valiant's algorithm performs well for parsing prac- 
tical inputs, given a special handling of iteration and a sparse matrix representation (even 
when using the naive matrix multiplication algorithm). If we also account for the ease of 
making parallel and incremental implementations of the algorithm thanks to its 
divide-and-conquer structure, we must classify Valiant's algorithm as a practical method of parsing. 

In fact, Valiant's algorithm offers such a combination of simplicity and performance that 
we believe it deserves a prominent place in textbooks, on par with LALR algorithms. 



8 Conclusions 

At the start of this work, we set out to find an associative operator with sub-linear com- 
plexity that could be used to implement a divide-and-conquer algorithm for parsing. The 
goal was to obtain a parallelizable parsing algorithm that would double as an incremental 
parsing algorithm. We managed to find such an operator, but the desired complexity only 
holds under certain assumptions that luckily do seem to hold in practice. The conditions 
hold when the recursive nesting depth of a program text grows only, say, logarithmically 
in terms of the total length of the program. An unanticipated result of our work is thus 
the definition of a new class of languages. We were also forced to come up with a special 
way of dealing with iteration (frequently occurring in grammars) so it would not break this 
practical assumption. 

Appendix: C Program Fragment 




Fragment of a C program corresponding to the chart in Fig. 4. It was excerpted by hand from the 
Linux kernel scheduler (beginning of the file 
https://github.com/torvalds/linux/blob/master/kernel/sched/core.c). 